Monitoring the Dynamic Web to respond to Continuous Queries
Abstract
Continuous queries are queries whose responses must be updated continuously as the sources of interest change. Such queries arise, for instance, in on-line decision making such as traffic flow control and weather monitoring. Keeping the responses current reduces to deciding how often to visit a source to determine whether and how it has been modified, so that the user's response can be updated accordingly. On the surface, this seems similar to the crawling problem, since crawlers attempt to keep indexes up to date as users pose search queries. We show that this is not the case, owing both to inherent differences in the nature of the two problems and to differences in the performance metric. We also develop and evaluate a multiphase solution to the problem. Some of the important phases are the following. In the monitoring phase, changes to an initially identified set of relevant pages are tracked. From the observed change characteristics of these pages, a probabilistic model of their change behaviour is formulated, and weights are assigned to pages to denote their importance for the current queries. In the next phase, the resource allocation phase, the resources needed to continuously probe these pages for changes are allocated based on these statistics. Given these resource allocations, the scheduling phase produces an optimal achievable schedule for the probings. An experimental evaluation of our approach against prior approaches for crawling dynamic web pages leads to some interesting observations on the differences between the two problems: crawling, to build an index, and change tracking, to respond to continuous queries.
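To make the flow of the three phases concrete, the following is a minimal sketch in Python and not the method of the paper. The abstract does not specify the probabilistic model, the allocation rule, or the scheduling algorithm, so everything here is an assumption made for illustration: the names `PageStats`, `change_probability`, `allocate_probes`, and `schedule_probes` are hypothetical, the change model is a smoothed change frequency, the allocation is proportional to weight times change probability, and the schedule simply spaces probes evenly (it is a placeholder, not the optimal schedule the abstract refers to).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PageStats:
    """Counts gathered for one page during a (hypothetical) monitoring phase."""
    url: str
    probes: int     # how many times the page was visited while monitoring
    changes: int    # how many of those visits found the page modified
    weight: float   # importance of the page for the current queries


def change_probability(page: PageStats) -> float:
    # Assumed model: Laplace-smoothed fraction of probes that saw a change.
    return (page.changes + 1) / (page.probes + 2)


def allocate_probes(pages: list[PageStats], budget: int) -> dict[str, int]:
    """Split `budget` probes across pages in proportion to weight * change probability."""
    scores = [p.weight * change_probability(p) for p in pages]
    total = sum(scores)
    if total == 0:
        return {p.url: 0 for p in pages}
    shares = [budget * s / total for s in scores]
    alloc = [int(s) for s in shares]  # floor of each fractional share
    # Hand out the probes lost to flooring, largest fractional remainder first.
    leftover = budget - sum(alloc)
    by_remainder = sorted(range(len(pages)),
                          key=lambda i: shares[i] - alloc[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return {p.url: n for p, n in zip(pages, alloc)}


def schedule_probes(allocation: dict[str, int], period: float) -> list[tuple[float, str]]:
    """Place each page's probes at evenly spaced times within one period."""
    schedule = []
    for url, n in allocation.items():
        for k in range(n):
            schedule.append(((k + 0.5) * period / n, url))
    return sorted(schedule)


# Example: three pages, ten probes to spend per 60-unit period.
pages = [
    PageStats("traffic", probes=6, changes=5, weight=2.0),
    PageStats("weather", probes=6, changes=1, weight=2.0),
    PageStats("news", probes=6, changes=3, weight=1.0),
]
allocation = allocate_probes(pages, budget=10)
for time, url in schedule_probes(allocation, period=60.0):
    print(f"t={time:5.1f}  probe {url}")
```

In this sketch a page that changes often and matters to many queries receives more probes, which is the intuition the abstract describes. A real system would replace each of the three functions with the model, allocation rule, and scheduler actually chosen.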
Similar resources
A dynamic method for answering ad hoc continuous aggregate queries
Data streams are unbounded, fast sequences of time-stamped data elements that arrive in bursts. Generally, these elements need to be processed online and in real time, so algorithms that process data streams and answer queries on them are mostly one-pass. The execution of such algorithms poses challenges such as memory limitation, scheduling, and accuracy of answers. They will be more ...
Analysis of users’ query reformulation behavior in Web with regard to Wholistic/analytic cognitive styles, Web experience, and search task type
Background and Aim: The basic aim of the present study is to investigate users’ query reformulation behavior with regard to wholistic-analytic cognitive styles, search task type, and experience in using the Web. Method: This study is applied research using the survey method. A total of 321 search queries were submitted by 44 users. Data collection tools were Riding’s Cognitive Style A...
Monitoring the Dynamic Web to respond to Continuous Queries: A Demonstration
Our Continuous Adaptive Monitoring (CAM) system provides responses for continuous queries by monitoring and extracting information scattered across the web. Continuous queries are the queries for which responses given to users must be continuously updated, as the sources of interest get updated. Such queries occur, for instance, during on-line decision making, e.g., traffic flow control, weathe...
A new model for phrase search based on weighted minimum displacement
Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...